{
 "cells": [
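  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tuples, Sets, and Dictionaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Having studied *list* in detail, the remaining three containers are much easier, because much of what we covered is similar or even identical. This chapter focuses on what is distinctive about the **tuple**, the **set**, and the **dictionary**, and on their common use cases."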
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tuples"
   ]
  },
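  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Tuple* is a coined mathematical term; the word itself carries little meaning (in Chinese it is usually rendered as “元组”).\n",
    "\n",
    "A *tuple* is very similar to a *list*. Both are sequence types, meaning that the elements they contain are ordered.\n",
    "\n",
    "In Python a *tuple* is written in parentheses (as opposed to the square brackets of a *list*)."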
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1, 24, 'hello')\n"
     ]
    }
   ],
   "source": [
    "t1 = (1, 24, 'hello')\n",
    "print(t1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tuple"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(t1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parentheses can in fact be omitted; the statement below builds the same tuple as the one above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1, 24, 'hello') (1, 24, 'hello')\n"
     ]
    }
   ],
   "source": [
    "t2 = 1, 24, 'hello'\n",
    "print(t1, t2)"
   ]
  },
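  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that a *tuple* with a single element must end with a trailing comma, which is different from a *list*. Without the comma, `(1)` is just the integer `1` in parentheses:"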
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'tuple'>\n"
     ]
    }
   ],
   "source": [
    "t3 = (1,)\n",
    "print(type(t3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As with a *list*, the elements of a *tuple* are accessed by index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "t1[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The elements of a *tuple* can themselves be *tuples*, which gives a nested *tuple*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "((1, 24, 'hello'), (1, 2, 3))\n"
     ]
    }
   ],
   "source": [
    "t = t1, (1, 2, 3)\n",
    "print(t)"
   ]
  },
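  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Immutable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main difference between a *tuple* and a *list* is that a *tuple* cannot be changed once it has been created. This property is called being *immutable*. It is a strict requirement, but it brings many benefits; above all it reduces the complexity of a program, making its behaviour more predictable and easier to test and verify.\n",
    "\n",
    "A *list*, by contrast, is *mutable*.\n",
    "\n",
    "If we try to modify an element of a *tuple*, a `TypeError` runtime exception is raised (meaning that an operation was applied to an object whose type does not support it), as in the statement below:"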
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Trying to change an element of a tuple raises an exception\n",
    "# t[0] = 888"
   ]
  },
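  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To observe the error without stopping the notebook, the assignment can be wrapped in `try`/`except`. The following is only a minimal sketch; the names `point` and `message` are illustrative and are not used elsewhere in this chapter.\n",
    "\n",
    "```python\n",
    "point = (1, 24, 'hello')\n",
    "\n",
    "try:\n",
    "    point[0] = 888          # tuples do not support item assignment\n",
    "except TypeError as err:\n",
    "    message = str(err)      # keep the error text for inspection\n",
    "\n",
    "print(message)\n",
    "print(point)                # the tuple is unchanged\n",
    "```\n",
    "\n",
    "Catching the exception is only for demonstration; in real code the usual remedy is to build a new *tuple* instead of changing an existing one."
   ]
  },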
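  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that which elements a *tuple* contains, and their order, cannot change. An element of a *tuple* may itself be *mutable*, however, and its contents can then change. Consider the following example:"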
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "([1, 2], [4, 5])\n",
      "([1, 2, 3], [4, 5])\n"
     ]
    }
   ],
   "source": [
    "a = [1, 2]\n",
    "b = [4, 5]\n",
    "t = a, b\n",
    "print(t)\n",
    "a.append(3)\n",
    "print(t)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This point is a little subtle but important, so make sure you understand it clearly."
   ]
  },
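  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to picture this is that a *tuple* stores references to objects, and it is the set of references that is frozen. The sketch below, which uses the illustrative name `rebinding_failed`, contrasts mutating a contained list with trying to replace it.\n",
    "\n",
    "```python\n",
    "a = [1, 2]\n",
    "t = (a, [4, 5])\n",
    "\n",
    "print(t[0] is a)    # True: the tuple refers to the very same list object\n",
    "t[0].append(3)      # allowed: this mutates the list, not the tuple\n",
    "print(a)            # [1, 2, 3]\n",
    "\n",
    "rebinding_failed = False\n",
    "try:\n",
    "    t[0] = [9, 9]   # not allowed: this would replace a slot of the tuple\n",
    "except TypeError:\n",
    "    rebinding_failed = True\n",
    "print(rebinding_failed)\n",
    "```"
   ]
  },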
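  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pack and Unpack"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A *tuple* is much like a *list* and is also an *iterable*, so much of what we covered in the previous chapter applies to *tuples* as well, except of course for adding, removing, or modifying elements and sorting in place."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Despite the similarity, a *tuple* is usually used in different situations from a *list*. The two main differences are:\n",
    "* a *list* usually holds elements of the same type (although mixed types are allowed), whereas a *tuple* has no such convention and often holds elements of different types;\n",
    "* a *list* is usually processed with loops and iteration, whereas a *tuple* is widely used for packing and unpacking data.\n",
    "\n",
    "This section focuses on *pack* and *unpack*.\n",
    "\n",
    "Putting a group of values (of any type) into a *tuple* is called *pack*:"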
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1, 24, 'hello', True)\n"
     ]
    }
   ],
   "source": [
    "t = 1, 24, 'hello', True\n",
    "print(t)"
   ]
  },
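  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As mentioned earlier, the parentheses can be omitted when creating a *tuple*; that is in fact a *pack* as well.\n",
    "\n",
    "The reverse operation, assigning the elements of a *tuple* to several variables, is called *unpack*.\n",
    "\n",
    "Note that the number of variables on the left must exactly equal the number of elements in the *tuple*; too many or too few raises a `ValueError`:"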
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 24 hello True\n"
     ]
    }
   ],
   "source": [
    "start, end, s, f = t\n",
    "print(start, end, s, f)"
   ]
  },
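  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below shows what happens when the counts do not match, and one way to unpack only part of a *tuple* with a starred name. The variable names `error_name`, `first`, `middle` and `last` are chosen only for illustration.\n",
    "\n",
    "```python\n",
    "t = 1, 24, 'hello', True\n",
    "\n",
    "try:\n",
    "    start, end = t                    # four values, only two names\n",
    "except ValueError as err:\n",
    "    error_name = type(err).__name__\n",
    "    print(error_name, err)\n",
    "\n",
    "# A starred name collects the remaining elements into a list\n",
    "first, *middle, last = t\n",
    "print(first, middle, last)            # 1 [24, 'hello'] True\n",
    "```"
   ]
  },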
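  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Pack* and *unpack* are used everywhere in Python. Here are a few examples:\n",
    "\n",
    "* The *arbitrary arguments* introduced in [the chapter on function definitions](p2-1-function-def.ipynb)  \n",
    "The variable number of values passed in a call is packed into a *tuple*, which the function body usually traverses with a `for...in` loop.\n",
    "\n",
    "* Multiple return values from a function  \n",
    "A `return` statement packs the values that follow it into a *tuple*, and the `a, b = f(x)` syntax at the call site is an *unpack*.\n",
    "\n",
    "* The iterator returned by the `enumerate(iter)` function introduced in [the previous chapter](p2-8-list.ipynb)  \n",
    "Each of its elements is a *tuple* pairing the iteration index with an element of `iter`."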
   ]
  },
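  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three situations listed above can be sketched in a few lines. The functions `total` and `min_and_max` are hypothetical helpers written for this illustration only.\n",
    "\n",
    "```python\n",
    "def total(*numbers):\n",
    "    # numbers is a tuple holding every positional argument\n",
    "    result = 0\n",
    "    for n in numbers:\n",
    "        result += n\n",
    "    return result\n",
    "\n",
    "def min_and_max(values):\n",
    "    # The two comma-separated values are packed into one tuple\n",
    "    return min(values), max(values)\n",
    "\n",
    "print(total(1, 2, 3))                  # 6\n",
    "low, high = min_and_max([3, 1, 4])     # unpack the returned tuple\n",
    "print(low, high)                       # 1 4\n",
    "\n",
    "for index, letter in enumerate('ab'):  # each item is an (index, element) tuple\n",
    "    print(index, letter)\n",
    "```"
   ]
  },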
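  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another difference between a *tuple* and a *list* is that a *tuple* can be used as a *key* in a *dictionary*, whereas a *list* cannot. An example follows in the discussion of *dictionaries* below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A *set* is essentially the set of mathematics:\n",
    "* the elements of a set have no order;\n",
    "* a set contains no duplicate elements.\n",
    "\n",
    "Python supports the basic set operations of mathematics: testing whether a set contains an element (*belongs to*), and the *union*, *intersection*, *difference*, and *symmetric difference* of two sets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating Sets and Basic Operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A set in Python is a group of values enclosed in curly braces, usually values of the same type. If duplicate values are given when a set is created, Python removes the duplicates automatically:"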
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'banana', 'pear', 'orange', 'apple'}\n"
     ]
    }
   ],
   "source": [
    "fruits = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}\n",
    "print(fruits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "set"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(fruits)"
   ]
  },
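  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One detail worth checking for yourself: empty curly braces create a *dict*, not a set, so an empty set has to be written as `set()`. A small sketch with illustrative variable names:\n",
    "\n",
    "```python\n",
    "empty_dict = {}\n",
    "empty_set = set()\n",
    "print(type(empty_dict).__name__)   # dict\n",
    "print(type(empty_set).__name__)    # set\n",
    "\n",
    "# Duplicates are dropped no matter how the set is built\n",
    "unique = set(['apple', 'orange', 'apple'])\n",
    "print(len(unique))                 # 2\n",
    "```"
   ]
  },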
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `in` operator tests whether an element belongs to a set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'orange' in fruits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'coconut' in fruits"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After a set has been created, elements can be added and removed with the `add` and `remove` methods:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "False\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "print('coconut' in fruits)\n",
    "fruits.add('coconut')\n",
    "print('coconut' in fruits)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "print('apple' in fruits)\n",
    "fruits.remove('apple')\n",
    "print('apple' in fruits)"
   ]
  },
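  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `remove` raises a `KeyError` when the element is not in the set, while the related `discard` method silently does nothing in that case. A short sketch, using a fresh set and the illustrative name `remove_failed` so that it does not depend on the cells above:\n",
    "\n",
    "```python\n",
    "fruits = {'orange', 'pear', 'banana'}\n",
    "\n",
    "fruits.discard('apple')        # no error even though 'apple' is absent\n",
    "\n",
    "remove_failed = False\n",
    "try:\n",
    "    fruits.remove('apple')     # remove() insists that the element exists\n",
    "except KeyError:\n",
    "    remove_failed = True\n",
    "\n",
    "print(remove_failed)           # True\n",
    "print(len(fruits))             # 3\n",
    "```"
   ]
  },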
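  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because a set cannot contain duplicates, it is well suited to representing things like categories and keywords. Sets are implemented with hashing, so an `in` test takes roughly constant time on average and stays fast even for very large sets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As with *list*, Python provides a `set()` function that converts a string, another data container, or any *iterable* into a set, for example:"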
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = set('abracadabra') # => {'a', 'r', 'b', 'c', 'd'}\n",
    "b = set('alacazam') # => {'a', 'z', 'l', 'm', 'c'}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Four operations are available between sets: *union*, *intersection*, *difference*, and *symmetric difference*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Union: every element that is in a or in b\n",
    "a | b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a', 'c'}"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Intersection: only the elements that are in both a and b\n",
    "a & b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'b', 'd', 'r'}"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Difference: the elements that are in a but not in b\n",
    "a - b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'b', 'd', 'l', 'm', 'r', 'z'}"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Symmetric difference: the elements that are in a or in b, but not in both\n",
    "a ^ b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'b', 'd', 'r'}"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Identity 1: a - b = a - (a & b)\n",
    "a - (a & b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'b', 'd', 'l', 'm', 'r', 'z'}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Identity 2: a ^ b = (a | b) - (a & b)\n",
    "(a | b) - (a & b)"
   ]
  },
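  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each operator also has a method form on the built-in `set` type. The sketch below checks that the two forms agree and shows a subset test with `<=`.\n",
    "\n",
    "```python\n",
    "a = set('abracadabra')\n",
    "b = set('alacazam')\n",
    "\n",
    "print(a.union(b) == a | b)\n",
    "print(a.intersection(b) == a & b)\n",
    "print(a.difference(b) == a - b)\n",
    "print(a.symmetric_difference(b) == a ^ b)\n",
    "\n",
    "# Unlike the operators, the methods accept any iterable, not only sets\n",
    "print(a.intersection('xyzc'))      # {'c'}\n",
    "print({'a', 'c'} <= a)             # True: {'a', 'c'} is a subset of a\n",
    "```\n",
    "\n",
    "The operators read more like mathematics; the methods are convenient when the other operand is a string, list, or other *iterable* that has not been converted to a set."
   ]
  },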
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iterating over a Set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A set is also an *iterable*, so as with a *list*, a set can be traversed either with a for-in loop or with a *set comprehension*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#0: coconut\n",
      "#1: orange\n",
      "#2: banana\n",
      "#3: pear\n"
     ]
    }
   ],
   "source": [
    "for index, fruit in enumerate(fruits):\n",
    "    print(f'#{index}: {fruit}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'r', 'd'}\n"
     ]
    }
   ],
   "source": [
    "a = {x for x in 'abracadabra' if x not in 'abc'}\n",
    "print(a)"
   ]
  },
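  {
   "cell_type": "markdown",
   "metadata": {
    "toc-hr-collapsed": false
   },
   "source": [
    "## Dictionaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The *dictionary* in Python, known in other languages as a hash map or hash table, is another common data container. Its characteristics are:\n",
    "* each element is a key-value pair, that is, a value (*value*) together with its name (*key*), and an element can be located quickly through its key;\n",
    "* elements have no positional index. Since Python 3.7 a *dict* remembers the order in which keys were inserted, but every operation we perform still goes through the *key*.\n",
    "\n",
    "> The actual type of a *dictionary* is called `dict`, so below we will also simply write *dict*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating Dictionaries and Reading and Writing Elements"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A dictionary in Python is a group of key-value pairs enclosed in curly braces, each written as `key: value`:"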
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'cat': 'cute', 'dog': 'furry'}\n"
     ]
    }
   ],
   "source": [
    "pets = {'cat': 'cute', 'dog': 'furry'}\n",
    "print(pets)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(pets)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In a sense, a key can be seen as the “index” of a *dict*: we can access a value through its key, and we can use the `in` operator to test whether a key exists in the *dict*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cute\n",
      "No information found for pet turtle\n"
     ]
    }
   ],
   "source": [
    "def find_pet(name):\n",
    "    if name in pets:\n",
    "        print(pets[name])\n",
    "    else:\n",
    "        print(f'No information found for pet {name}')\n",
    "\n",
    "find_pet('cat')\n",
    "find_pet('turtle')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use an assignment statement to set the *value* for a *key* directly:\n",
    "* if the *key* did not exist before, the *key-value* pair is added to the *dict*;\n",
    "* if the *key* is already in the *dict*, its *value* is replaced.\n",
    "\n",
    "For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No information found for pet fish\n",
      "wet\n",
      "furry\n",
      "brave\n"
     ]
    }
   ],
   "source": [
    "find_pet('fish')\n",
    "pets['fish'] = 'wet'\n",
    "find_pet('fish')\n",
    "\n",
    "find_pet('dog')\n",
    "pets['dog'] = 'brave'\n",
    "find_pet('dog')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reading the value of a *key* that does not exist raises a `KeyError` runtime exception. To fetch elements from a *dict* safely, Python provides the `get()` method. Its first argument is the `key`; its optional second argument is a default value (`None` if omitted), which is returned when the `key` does not exist, so no exception is raised:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n",
      "wet\n"
     ]
    }
   ],
   "source": [
    "# Accessing a key that does not exist raises an exception\n",
    "# print(pets['monkey'])  # raises KeyError: 'monkey'\n",
    "\n",
    "# The safe, standard way is to use the get() method\n",
    "print(pets.get('monkey', 'N/A'))\n",
    "print(pets.get('fish', 'N/A'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `del` statement removes an element from a `dict`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No information found for pet fish\n"
     ]
    }
   ],
   "source": [
    "if 'fish' in pets:\n",
    "    del pets['fish']\n",
    "\n",
    "find_pet('fish')"
   ]
  },
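  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two related details of the built-in `dict` type are sketched below: `get()` returns `None` when no default is supplied, and `pop()` removes a key while returning its value, optionally with a default for absent keys. The sketch uses its own small `pets` dict and illustrative variable names.\n",
    "\n",
    "```python\n",
    "pets = {'cat': 'cute', 'dog': 'brave'}\n",
    "\n",
    "print(pets.get('monkey'))             # None: the default when none is given\n",
    "\n",
    "removed = pets.pop('dog')             # removes the key and returns its value\n",
    "print(removed)                        # brave\n",
    "\n",
    "missing = pets.pop('fish', 'N/A')     # a default avoids KeyError for absent keys\n",
    "print(missing)                        # N/A\n",
    "print(pets)                           # {'cat': 'cute'}\n",
    "```"
   ]
  },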
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iterating over a Dictionary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A *dict* is also an *iterable*, so it too can be traversed with `for...in` and with a *dict comprehension*.\n",
    "\n",
    "Since the elements of a *dict* are key-value pairs, `for...in` comes in two forms: iterating over the *keys* only, and iterating over the *key-value* pairs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "d = {'person': 2, 'cat': 4, 'spider': 8}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A person has 2 legs\n",
      "A cat has 4 legs\n",
      "A spider has 8 legs\n"
     ]
    }
   ],
   "source": [
    "for animal in d:\n",
    "    legs = d[animal]\n",
    "    print(f'A {animal} has {legs} legs')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A person has 2 legs\n",
      "A cat has 4 legs\n",
      "A spider has 8 legs\n"
     ]
    }
   ],
   "source": [
    "for animal, legs in d.items():\n",
    "    print(f'A {animal} has {legs} legs')"
   ]
  },
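  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `items()` method of the `dict` class returns a view object that can be iterated like a list. Each of its elements is a *tuple* of two items, corresponding to a *key-value* pair in the *dict*. That is why the second `for...in` statement above can iterate with two loop variables, `animal` (the *key*) and `legs` (the *value*)."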
   ]
  },
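  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below inspects what `items()` returns, together with the related `keys()` and `values()` methods. It assumes Python 3.7 or later, where a *dict* preserves insertion order; the name `items` is illustrative.\n",
    "\n",
    "```python\n",
    "d = {'person': 2, 'cat': 4, 'spider': 8}\n",
    "\n",
    "items = d.items()\n",
    "print(type(items).__name__)      # dict_items\n",
    "\n",
    "d['bird'] = 2                    # the view reflects later changes to d\n",
    "print(len(items))                # 4\n",
    "\n",
    "print(list(d.keys()))            # ['person', 'cat', 'spider', 'bird']\n",
    "print(list(d.values()))          # [2, 4, 8, 2]\n",
    "print(list(items)[0])            # ('person', 2)\n",
    "```\n",
    "\n",
    "Wrapping a view in `list()` gives an independent snapshot when one is needed, for example before modifying the *dict* inside a loop."
   ]
  },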
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A *dict comprehension* looks much like the comprehensions of the other containers, and we can even build a *dict* starting from a *list*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{0: 0, 2: 4, 4: 16}\n"
     ]
    }
   ],
   "source": [
    "nums = [0, 1, 2, 3, 4]\n",
    "even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}\n",
    "print(even_num_to_square)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dict and Tuple"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Dict* and *tuple* make a good pair and are often used together."
   ]
  },
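  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A very common design goes like this: some data source (for example a function call) yields two related values, such as a person's id and email address:\n",
    "\n",
    "```python\n",
    "# Stand-in for an online service or database (sample data for illustration)\n",
    "pending = [('neo', 'neo@zion.org'), ('trinity', 'trinity@zion.org')]\n",
    "\n",
    "def get_next_id_and_email():\n",
    "    # A real program would query an online service or a database here\n",
    "    user_id, email = pending.pop(0)\n",
    "    # The two comma-separated values below form a tuple\n",
    "    return user_id, email\n",
    "\n",
    "# Call the function repeatedly and collect the (id, email) tuples in a list\n",
    "all_ids_and_emails = []\n",
    "while pending:\n",
    "    t = get_next_id_and_email()\n",
    "    all_ids_and_emails.append(t)\n",
    "\n",
    "# all_ids_and_emails is now [('neo', 'neo@zion.org'), ('trinity', 'trinity@zion.org')]\n",
    "# The dict() function converts this list into a dict\n",
    "dict(all_ids_and_emails) # => {'neo': 'neo@zion.org', 'trinity': 'trinity@zion.org'}\n",
    "```\n",
    "\n",
    "The code above illustrates a design pattern that applies in many situations. Its basic logic is:\n",
    "1. `pack` each key and value into a `tuple`;\n",
    "2. put the `tuple`s of the same kind into a `list`;\n",
    "3. convert the `list` into a `dict` with the `dict()` function;\n",
    "4. look up the value for any key in the `dict`, which is both convenient and efficient."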
   ]
  },
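  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition, because **the *key* of a *dict* must be *immutable*** (more precisely, hashable), a *tuple* whose elements are all immutable can be used as a *key* in a *dict*, whereas a *list* cannot.\n",
    "\n",
    "Here is an example. Suppose we want to count how many times each student takes part in sports activities. We can use `(student name, sport)` as the key and the number of times as the value, so the *dict* looks roughly like this:\n",
    "\n",
    "```python\n",
    "{('Zhang San', 'football'): 3, ('Zhang San', 'badminton'): 1, ('Li Si', 'football'): 1, ('Wang Wu', 'badminton'): 2, ('Wang Wu', 'swimming'): 4, ...}\n",
    "```\n",
    "\n",
    "Replacing `('Zhang San', 'football')` with the list `['Zhang San', 'football']` here is not allowed."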
   ]
  },
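  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A runnable sketch of this counting idea follows. The `events` list is made-up sample data and the names `counts` and `list_key_failed` are illustrative; the point is only that a *tuple* works as a key while a *list* raises a `TypeError`.\n",
    "\n",
    "```python\n",
    "# Hypothetical sample data: one (student, sport) tuple per attendance\n",
    "events = [('Neo', 'football'), ('Trinity', 'football'), ('Neo', 'football')]\n",
    "\n",
    "counts = {}\n",
    "for student, sport in events:\n",
    "    key = (student, sport)                 # a tuple of strings is hashable\n",
    "    counts[key] = counts.get(key, 0) + 1\n",
    "\n",
    "print(counts[('Neo', 'football')])         # 2\n",
    "\n",
    "list_key_failed = False\n",
    "try:\n",
    "    counts[['Neo', 'football']] = 1        # a list is unhashable\n",
    "except TypeError as err:\n",
    "    list_key_failed = True\n",
    "    print('TypeError:', err)\n",
    "```"
   ]
  },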
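  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dict and Sorting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unlike a *list*, a *dict* cannot be sorted in place: it keeps its elements in insertion order (since Python 3.7) and has no `sort()` method. The need to order its elements by some rule still comes up often, and the usual approach is to sort by that rule and output the result as a *list*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we have a *dict* holding the prices of several fruits:"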
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "prices = {'apple': 1.99, 'banana': 0.99, 'orange': 1.49, 'cantaloupe': 3.99, 'grapes': 0.39}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we want to sort by *key*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('apple', 1.99), ('banana', 0.99), ('cantaloupe', 3.99), ('grapes', 0.39), ('orange', 1.49)]\n"
     ]
    }
   ],
   "source": [
    "print(sorted(prices.items()))"
   ]
  },
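  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the `dict` class provides no `sort()` method, we use the built-in function `sorted()` here. It accepts the same `key` and `reverse` arguments as the `sort()` method of `list` introduced in the previous chapter, but it works on any *iterable* and returns a new *list* instead of sorting in place.\n",
    "\n",
    "We also use the `items()` method of the *dict* to obtain its *key-value* pairs (introduced in the section on iterating over a dictionary above), which `sorted()` then turns into a sorted *list*.\n",
    "\n",
    "> Why does plain sorting work here? Because:\n",
    "> * every element produced by the `items()` method of a *dict* is a *tuple*;\n",
    "> * each *tuple* holds two items, the first corresponding to the *key* in the *dict* and the second to the *value*;\n",
    "> * two *tuples* are compared item by item from the left, and since the *keys* of a *dict* are unique, the first item (the *key*) always decides the order.\n",
    "> \n",
    "> You can try writing a few lines of code to verify these facts."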
   ]
  },
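  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to check how *tuples* compare is sketched below. It uses a two-entry dict and the illustrative name `pairs`, and assumes Python 3.7 or later for the insertion order shown in the comments.\n",
    "\n",
    "```python\n",
    "# The first items differ, so they decide the comparison\n",
    "print(('apple', 1.99) < ('banana', 0.99))   # True\n",
    "# The first items tie, so the second items are compared\n",
    "print(('apple', 1.99) < ('apple', 0.99))    # False\n",
    "\n",
    "prices = {'banana': 0.99, 'apple': 1.99}\n",
    "pairs = list(prices.items())\n",
    "print(pairs)             # [('banana', 0.99), ('apple', 1.99)]\n",
    "print(sorted(pairs))     # [('apple', 1.99), ('banana', 0.99)]\n",
    "```"
   ]
  },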
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To sort by *value*, the comparison must use the second item of each *tuple* (the *value*) instead of the first (the *key*):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('grapes', 0.39), ('banana', 0.99), ('orange', 1.49), ('apple', 1.99), ('cantaloupe', 3.99)]\n"
     ]
    }
   ],
   "source": [
    "print(sorted(prices.items(), key=lambda x:x[1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This time we pass a `key` function, an anonymous function that tells `sorted()` to compare two *tuples* (the `x` in the `lambda` expression) by their second item (the `x[1]` in the `lambda` expression)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also sort in reverse order by passing `reverse=True` to `sorted()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('cantaloupe', 3.99), ('apple', 1.99), ('orange', 1.49), ('banana', 0.99), ('grapes', 0.39)]\n"
     ]
    }
   ],
   "source": [
    "print(sorted(prices.items(), key=lambda x:x[1], reverse=True))"
   ]
  },
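  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this way we can sort a *dict* and output the result as a list in which each *tuple* represents one *key-value* pair of the *dict*.\n",
    "\n",
    "As a review question, consider this: given a *list* in which every element is a *dict*, how could it be sorted?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, suppose every user in our system has a name, an age, a height, and so on, and each person's information is stored in a *dict* like this:"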
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "person1 = {'name': 'Neo', 'age': 30, 'height': 175}\n",
    "person2 = {'name': 'Trinity', 'age': 29, 'height': 169}\n",
    "person3 = {'name': 'Morpheus', 'age': 43, 'height': 180}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then put all the users' information into a list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = [person1, person2, person3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can sort by name:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'name': 'Morpheus', 'age': 43, 'height': 180}, {'name': 'Neo', 'age': 30, 'height': 175}, {'name': 'Trinity', 'age': 29, 'height': 169}]\n"
     ]
    }
   ],
   "source": [
    "data.sort(key=lambda x:x['name'])\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also sort by age:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'name': 'Trinity', 'age': 29, 'height': 169}, {'name': 'Neo', 'age': 30, 'height': 175}, {'name': 'Morpheus', 'age': 43, 'height': 180}]\n"
     ]
    }
   ],
   "source": [
    "data.sort(key=lambda x:x['age'])\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or by height, in reverse order (tallest first):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'name': 'Morpheus', 'age': 43, 'height': 180}, {'name': 'Neo', 'age': 30, 'height': 175}, {'name': 'Trinity', 'age': 29, 'height': 169}]\n"
     ]
    }
   ],
   "source": [
    "data.sort(key=lambda x:x['height'], reverse=True)\n",
    "print(data)"
   ]
  },
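  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This situation is quite common and is known as sorting objects by one of their attributes. Sometimes an object is represented by a *dict* and sometimes by an instance of a custom class; either way it can be sorted with the approach shown above."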
   ]
  },
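  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an alternative to writing a `lambda`, the standard library module `operator` provides `itemgetter` and `attrgetter`, which build the same kind of `key` function. The sketch below is one possible variant, not a replacement for the approach above; the `Person` class is hypothetical and exists only for this illustration.\n",
    "\n",
    "```python\n",
    "from operator import attrgetter, itemgetter\n",
    "\n",
    "data = [\n",
    "    {'name': 'Neo', 'age': 30, 'height': 175},\n",
    "    {'name': 'Trinity', 'age': 29, 'height': 169},\n",
    "    {'name': 'Morpheus', 'age': 43, 'height': 180},\n",
    "]\n",
    "# itemgetter('age') behaves like lambda x: x['age']\n",
    "by_age = sorted(data, key=itemgetter('age'))\n",
    "print([p['name'] for p in by_age])       # ['Trinity', 'Neo', 'Morpheus']\n",
    "\n",
    "class Person:\n",
    "    # Hypothetical class, used only for this illustration\n",
    "    def __init__(self, name, age):\n",
    "        self.name = name\n",
    "        self.age = age\n",
    "\n",
    "people = [Person('Neo', 30), Person('Trinity', 29)]\n",
    "# attrgetter('age') behaves like lambda x: x.age\n",
    "youngest_first = sorted(people, key=attrgetter('age'))\n",
    "print([p.name for p in youngest_first])  # ['Trinity', 'Neo']\n",
    "```"
   ]
  },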
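  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Building on a systematic study of *iterable* and *list*, this chapter introduced the other three data containers, the **tuple**, the **set**, and the **dictionary** (*dict*), along with what is distinctive about each and where each is used.\n",
    "* A *tuple* is an ordered container whose elements cannot be changed (*immutable*); it is widely used for packing and unpacking (for example, function arguments and return values);\n",
    "* A *set* is an unordered container whose elements are unique, and it supports the usual set operations;\n",
    "* A *dict* is a container of values (*value*) that carry names (*key*); it is accessed by key instead of by position and has some distinctive uses of its own.\n",
    "\n",
    "The four data containers introduced in the previous chapter and in this one are all used very frequently, so be sure to read both chapters thoroughly and to understand and become familiar with every example and the principles behind it."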
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
